
    SQ Lower Bounds for Learning Bounded Covariance GMMs

    We study the complexity of learning mixtures of separated Gaussians with common unknown bounded covariance matrix. Specifically, we focus on learning Gaussian mixture models (GMMs) on $\mathbb{R}^d$ of the form $P = \sum_{i=1}^k w_i \mathcal{N}(\boldsymbol{\mu}_i, \mathbf{\Sigma}_i)$, where $\mathbf{\Sigma}_i = \mathbf{\Sigma} \preceq \mathbf{I}$ and $\min_{i \neq j} \|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|_2 \geq k^\epsilon$ for some $\epsilon > 0$. Known learning algorithms for this family of GMMs have complexity $(dk)^{O(1/\epsilon)}$. In this work, we prove that any Statistical Query (SQ) algorithm for this problem requires complexity at least $d^{\Omega(1/\epsilon)}$. In the special case where the separation is on the order of $k^{1/2}$, we additionally obtain fine-grained SQ lower bounds with the correct exponent. Our SQ lower bounds imply similar lower bounds for low-degree polynomial tests. Conceptually, our results provide evidence that known algorithms for this problem are nearly best possible.
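
    To make the model concrete, below is a minimal Python sketch of sampling from a GMM of this form; the dimension, weights, covariance, and the way the separated means are placed are all hypothetical illustration choices, not taken from the paper.

    import numpy as np

    # Hypothetical instance of the abstract's model: P = sum_i w_i N(mu_i, Sigma)
    # with a common covariance Sigma <= I and pairwise mean separation >= k^eps.
    rng = np.random.default_rng(0)
    d, k, eps = 10, 4, 0.5              # dimension, components, separation exponent

    # Place the means along e_1 at spacing k^eps, so that
    # min_{i != j} ||mu_i - mu_j||_2 >= k^eps holds by construction.
    sep = k ** eps
    means = np.zeros((k, d))
    means[:, 0] = sep * np.arange(1, k + 1)

    Sigma = np.diag(rng.uniform(0.1, 1.0, size=d))  # common covariance, Sigma <= I
    weights = np.full(k, 1.0 / k)                   # uniform mixing weights

    def sample_gmm(n):
        """Draw n i.i.d. samples from P = sum_i w_i N(mu_i, Sigma)."""
        comps = rng.choice(k, size=n, p=weights)
        noise = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
        return means[comps] + noise, comps

    X, labels = sample_gmm(1000)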

    Estimating the Number of Induced Subgraphs from Incomplete Data and Neighborhood Queries

    We consider a natural setting where network parameters are estimated from noisy and incomplete information about the network. More specifically, we investigate how we can efficiently estimate the number of small subgraphs (e.g., edges, triangles, etc.) based on full access to one or two noisy and incomplete samples of a large underlying network and on a few queries revealing the neighborhood of carefully selected vertices. After specifying a random generator which removes edges from the underlying graph, we present estimators with strong provable performance guarantees, which exploit information from the noisy network samples and query a constant number of the most important vertices for the estimation. Our experimental evaluation shows that, in practice, a single noisy network sample and a few hundred neighborhood queries suffice for accurately estimating the number of triangles in networks with millions of vertices and edges.
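
    As a toy version of the scaling idea (not the paper's estimator, which additionally exploits neighborhood queries): if each edge of the underlying graph survives in the noisy sample independently with a known probability p, then every triangle survives with probability p^3, so dividing the observed triangle count by p^3 gives an unbiased estimate. The graph model and probabilities below are hypothetical.

    import itertools, random

    random.seed(0)
    n, p_edge, p_keep = 200, 0.1, 0.7   # graph size, G(n,p) density, edge survival prob.

    # Underlying Erdos-Renyi graph as a set of edges (i, j) with i < j.
    edges = {(i, j) for i, j in itertools.combinations(range(n), 2)
             if random.random() < p_edge}

    # Noisy, incomplete sample: each edge kept independently with probability p_keep.
    sample = {e for e in edges if random.random() < p_keep}

    def count_triangles(edge_set):
        adj = {v: set() for v in range(n)}
        for i, j in edge_set:
            adj[i].add(j); adj[j].add(i)
        # Count each triangle {a < b < c} exactly once, via its edge (a, b).
        return sum(1 for i, j in edge_set
                   for l in adj[i] & adj[j] if l > j)

    true_T = count_triangles(edges)
    est_T = count_triangles(sample) / p_keep ** 3  # unbiased: E[observed] = p^3 * T
    print(true_T, round(est_T))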

    Streaming Algorithms for High-Dimensional Robust Statistics

    We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust estimation tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber's contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly-linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks, including robust covariance estimation, robust regression, and more generally robust stochastic optimization.
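
    For intuition, here is a small Python sketch of (plain) Huber contamination in a stream; the paper works with a strengthening of this model, and the code below is not its algorithm. It only illustrates that the naive single-pass running mean already fits the streaming memory budget, roughly O(d) words, but is not robust: its error grows with the magnitude of the outliers. All parameters and the outlier choice are hypothetical.

    import numpy as np

    rng = np.random.default_rng(1)
    d, n, eps = 100, 20000, 0.1
    mu = np.ones(d)

    running_sum, count = np.zeros(d), 0    # O(d) memory, single pass over the stream
    for _ in range(n):
        if rng.random() < eps:
            x = 50.0 * np.ones(d)          # adversarial point, far from mu
        else:
            x = mu + rng.standard_normal(d)  # inlier from N(mu, I)
        running_sum += x
        count += 1

    naive_mean = running_sum / count
    # Error is roughly eps * ||outlier - mu||, i.e., far from the optimal O(eps) rate.
    print(np.linalg.norm(naive_mean - mu))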

    Robust Sparse Mean Estimation via Sum of Squares

    We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers. Prior work obtained sample and computationally efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For distributions on $\mathbb{R}^d$ with "certifiably bounded" $t$-th moments and sufficiently light tails, our algorithm achieves error of $O(\epsilon^{1-1/t})$ with sample complexity $m = (k \log(d))^{O(t)}/\epsilon^{2-2/t}$, where $k$ is the sparsity. For the special case of the Gaussian distribution, our algorithm achieves near-optimal error of $\tilde{O}(\epsilon)$ with sample complexity $m = O(k^4 \mathrm{polylog}(d))/\epsilon^2$. Our algorithms follow the Sum-of-Squares based proofs-to-algorithms approach. We complement our upper bounds with Statistical Query and low-degree polynomial testing lower bounds, providing evidence that the sample-time-error tradeoffs achieved by our algorithms are qualitatively the best possible.
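
    For illustration only, here is a hypothetical naive baseline in Python (not the paper's Sum-of-Squares algorithm): take coordinate-wise medians, which tolerate an eps-fraction of outliers in each coordinate, then keep the k largest-magnitude coordinates to exploit sparsity. It conveys the problem setup but does not attain the guarantees above; all parameters and the corruption pattern are made up.

    import numpy as np

    rng = np.random.default_rng(2)
    d, k, n, eps = 1000, 10, 5000, 0.05
    mu = np.zeros(d)
    mu[:k] = 3.0                            # k-sparse true mean

    X = mu + rng.standard_normal((n, d))    # inlier samples from N(mu, I)
    n_bad = int(eps * n)
    X[:n_bad] = 100.0                       # adversary overwrites an eps-fraction

    est = np.median(X, axis=0)              # robust per coordinate
    support = np.argsort(np.abs(est))[-k:]  # keep the k largest-magnitude coordinates
    mu_hat = np.zeros(d)
    mu_hat[support] = est[support]
    print(np.linalg.norm(mu_hat - mu))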

    List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering

    We study the problem of list-decodable sparse mean estimation. Specifically, for a parameter $\alpha \in (0, 1/2)$, we are given $m$ points in $\mathbb{R}^n$, $\lfloor \alpha m \rfloor$ of which are i.i.d. samples from a distribution $D$ with unknown $k$-sparse mean $\mu$. No assumptions are made on the remaining points, which form the majority of the dataset. The goal is to return a small list of candidates containing a vector $\widehat{\mu}$ such that $\|\widehat{\mu} - \mu\|_2$ is small. Prior work had studied the problem of list-decodable mean estimation in the dense setting. In this work, we develop a novel, conceptually simpler technique for list-decodable mean estimation. As the main application of our approach, we provide the first sample and computationally efficient algorithm for list-decodable sparse mean estimation. In particular, for distributions with "certifiably bounded" $t$-th moments in $k$-sparse directions and sufficiently light tails, our algorithm achieves error of $(1/\alpha)^{O(1/t)}$ with sample complexity $m = (k \log(n))^{O(t)}/\alpha$ and running time $\mathrm{poly}(m n^t)$. For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $\Theta(\sqrt{\log(1/\alpha)})$ with quasi-polynomial sample and computational complexity. We complement our upper bounds with nearly-matching statistical query and low-degree polynomial testing lower bounds.
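
    To illustrate the list-decodable setting, the sketch below is a hypothetical baseline, not the paper's difference-of-pairs filter: since only an $\alpha$-fraction of points are inliers, a single estimate cannot succeed, so the output is a short list of candidate means. It proposes random data points as candidates, keeps those with roughly an $\alpha$-fraction of the data nearby, and greedily deduplicates; the radius and threshold are arbitrary toy choices.

    import numpy as np

    rng = np.random.default_rng(3)
    n_dim, m, alpha = 20, 2000, 0.2
    mu = np.zeros(n_dim); mu[:5] = 4.0     # sparse inlier mean

    n_in = int(alpha * m)
    inliers = mu + rng.standard_normal((n_in, n_dim))
    outliers = 10.0 * rng.standard_normal((m - n_in, n_dim))  # arbitrary majority
    X = np.vstack([inliers, outliers])

    radius = 2.0 * np.sqrt(n_dim)          # radius capturing most inlier mass
    cands = []
    for x in X[rng.choice(m, size=50, replace=False)]:
        near = X[np.linalg.norm(X - x, axis=1) <= radius]
        if len(near) >= 0.5 * alpha * m:   # enough mass nearby to be plausible
            c = near.mean(axis=0)
            if all(np.linalg.norm(c - c2) > radius for c2 in cands):
                cands.append(c)            # deduplicate nearby candidates

    # A short list; at least one candidate should be close to the true mean.
    print(len(cands), min(np.linalg.norm(c - mu) for c in cands))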